feat: Governance changes #118
…d adapter wiring
Adds the runtime-side governance subsystem behind the
``EnablePythonGovernanceChecker`` feature flag. When the flag is off,
none of this is imported — the gate is at
``src/uipath/runtime/wrapper.py:apply_governance_wrapper`` and the
governance subtree stays off the startup path.
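A minimal sketch of the gate described above, assuming the flag is read from an environment variable and that the governance module exposes a `wrap` callable. Both are hypothetical stand-ins, not the actual `apply_governance_wrapper` implementation. It illustrates the two properties the PR relies on: nothing is imported while the flag is off, and any failure while wrapping returns the unwrapped runtime (fail-open).

```python
import importlib
import os
from typing import Any, Callable

# Hypothetical flag source: the PR names the flag, not how it is read.
FLAG_ENV = "ENABLE_PYTHON_GOVERNANCE_CHECKER"


def flag_enabled() -> bool:
    """Treat a small set of truthy strings as 'flag on'."""
    return os.environ.get(FLAG_ENV, "").strip().lower() in {"1", "true", "yes"}


def apply_wrapper_sketch(
    runtime: Any,
    module_name: str,
    importer: Callable[[str], Any] = importlib.import_module,
) -> Any:
    """Return the runtime untouched when the flag is off; otherwise import
    the governance module lazily and wrap, failing open on any error."""
    if not flag_enabled():
        return runtime  # flag off: the governance module is never imported
    try:
        module = importer(module_name)  # deferred import, only on the FF-on path
        return module.wrap(runtime)  # `wrap` is a hypothetical entry point
    except Exception:
        return runtime  # fail open: a governance problem must not break the run
```

Passing the importer as a parameter keeps the sketch testable without the real package; a test can assert that the importer was never called on the FF-off path.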
Architecture:
- ``src/uipath/runtime/governance/wrapper.py:GovernanceRuntime`` —
proxies the wrapped runtime, fires BEFORE_AGENT / AFTER_AGENT at the
runtime boundary, materialises the evaluator + framework adapter
lazily on the first hook fire. Step-isolated dispose; init
side-effects (model-name ContextVar, agent-type selector, prefetch)
all live behind the FF gate so an FF-off path is a true no-op.
- ``src/uipath/runtime/governance/native/`` — in-process policy
evaluator: policy fetch (size-bounded, currency-anchored amount
detection, expanded verb pattern for proposal/SOW commitments),
YAML-to-index compiler, bounded-pool compensating /runtime/govern
call, agent-type query param (conversational vs autonomous),
job-context payload (folder/job/process/agent/version keys).
- ``src/uipath/runtime/governance/audit/`` — pluggable sink framework
with a background-thread queue. Default sinks: traces (OTel spans,
always on, platform-mandated) and console (stderr, opt-in via
``UIPATH_GOVERNANCE_CONSOLE_LOG``). Sink failures circuit-break
after 10 consecutive errors; counters reset on re-register so a
fresh instance doesn't inherit a tripped state. ``close()`` shutdown
is bounded — ``put_nowait`` sentinel + ``_shutdown.set()`` signal so
a wedged sink can't hang process exit.
- ``src/uipath/runtime/governance/delegation_guard.py`` — async-aware
depth guard, patches both ``invoke`` and ``ainvoke`` with sync/async
wrappers matched via ``iscoroutinefunction``. Per-agent depths live
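in a single module-level ``ContextVar`` holding a dict keyed by
``id(agent)`` with ``int`` depths. Context objects hold strong
references to the ContextVars set in them, which keeps those variables
from being garbage-collected reliably, so the prior
one-ContextVar-per-agent design was an unbounded leak.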
- ``src/uipath/runtime/governance/audit/traces.py`` — rule-level OTel
span surfaces matched non-allow actions as ``Status.ERROR``
(including audit-mode violations the runtime intentionally didn't
block). Hook spans stay UNSET; severity belongs on the rule that
fired.
- ``src/uipath/runtime/registry.py`` — ``UiPathWrappedRuntimeFactory``
wraps every registered factory so every runtime it produces passes
through ``apply_governance_wrapper``.
Notable contracts:
- ``_extract_governable_text`` (wrapper.py): pulls clean content out
of arbitrary runtime payloads. Walks dicts (priority keys: content /
text / output / answer / message / result / arguments / thinking),
list-of-blocks, pydantic models, dataclasses, plain objects. Cycle-
safe, depth-capped, 8000-char budget. Replaces the prior
``str(value)[:2000]`` shortcut that produced dict-repr garble.
- ``commitment_concern`` (``A.10.4``): OR semantics with currency-
anchored amount detection. Verb pattern covers first-person promise
verbs and proposal/SOW markers ("Cost: $X", "fixed scope",
"Deliverables", "Timeline: N days", "I propose"). Bare percentages
intentionally not matched — they false-positive on status text.
- Compensation payload: ``FiredRule`` TypedDict carries per-rule
metadata for LLMOps trace records; the validators list is derived
from it.
- Job-context resolution: memoized once per process via
``functools.lru_cache``; tests can invalidate via ``cache_clear``.
- Process-level governance state (conversational selector,
job-context cache) reset between tests via an autouse fixture in
``tests/conftest.py``.
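To make the ``_extract_governable_text`` contract concrete, here is a simplified sketch of a cycle-safe, depth-capped, budget-bounded walker. It is not the implementation in ``wrapper.py``. The name ``extract_text``, the ``MAX_DEPTH`` value, and the choice to walk only priority keys when any are present are assumptions made for illustration.

```python
from dataclasses import fields, is_dataclass
from typing import Any, List, Optional, Set

# Priority keys as listed in the PR description.
PRIORITY_KEYS = (
    "content", "text", "output", "answer",
    "message", "result", "arguments", "thinking",
)
MAX_DEPTH = 8   # hypothetical cap; the PR only says "depth-capped"
BUDGET = 8000   # character budget from the PR description


def extract_text(
    value: Any,
    budget: int = BUDGET,
    seen: Optional[Set[int]] = None,
    depth: int = 0,
) -> str:
    """Collect string content from a nested payload within `budget` chars."""
    seen = set() if seen is None else seen
    if budget <= 0 or depth > MAX_DEPTH or value is None:
        return ""
    if isinstance(value, str):
        return value[:budget]
    if isinstance(value, (bool, int, float, bytes)):
        return ""  # assumption: non-string scalars carry no governable text
    if id(value) in seen:
        return ""  # cycle guard: this container was already visited
    seen.add(id(value))

    children: List[Any]
    if isinstance(value, dict):
        # Assumption: when priority keys are present, only they are walked.
        keys = [k for k in PRIORITY_KEYS if k in value] or list(value)
        children = [value[k] for k in keys]
    elif isinstance(value, (list, tuple)):
        children = list(value)
    elif is_dataclass(value) and not isinstance(value, type):
        children = [getattr(value, f.name) for f in fields(value)]
    elif callable(getattr(value, "model_dump", None)):
        children = [value.model_dump()]  # pydantic-style model
    elif hasattr(value, "__dict__"):
        children = [vars(value)]  # plain object
    else:
        return ""

    parts: List[str] = []
    remaining = budget
    for child in children:
        text = extract_text(child, remaining, seen, depth + 1)
        if text:
            parts.append(text)
            remaining -= len(text) + 1  # +1 accounts for the newline separator
        if remaining <= 0:
            break
    return "\n".join(parts)[:budget]
```

Unlike a ``str(value)[:2000]`` shortcut, a walker of this shape returns only the string leaves, so dict punctuation and key names never reach the evaluator.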
Tests (225 passing):
- ``tests/test_evaluator.py`` — core evaluator + wrapper / adapter
integration via captured audit events.
- ``tests/test_commitment_concern.py`` — verb/amount/deadline OR
semantics, the proposal-style sample that originally slipped past
the rule, URL-fragment digits don't false-positive, percentage-only
status text stays silent.
- ``tests/test_delegation_guard.py`` — sync + async wrapper shapes,
shared depth counter across modes, leak fix (100 install/uninstall
cycles keep one ContextVar), multi-agent isolation.
- ``tests/test_dispose_isolation.py`` — each governance-side dispose
step survives upstream failures; delegate dispose still propagates.
- ``tests/test_text_extraction.py`` — dict / list-of-blocks /
pydantic / dataclass / cycle / budget cap.
- ``tests/test_audit_register_sink.py`` — sink failure counter reset
on re-register, duplicate register is a no-op, full lifecycle.
- ``tests/test_traces_severity.py`` — rule span ERROR for matched
non-allow, hook span stays UNSET regardless of final_action.
- ``tests/test_guardrail_compensation.py`` — compensating /govern call
payload, headers, URL composition, evaluator integration.
- ``tests/test_policy_agent_type.py`` — conversational vs autonomous
selector, policy URL query param.
- ``tests/test_registry.py`` — factory wrapping, governance attached
to every registered factory.
- ``tests/test_wrapper.py`` — FF gate + lazy-import contract.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR introduces a feature-flag–gated governance layer in uipath-runtime, wiring governance into runtime creation and execution while keeping governance imports and backend calls off the hot path when disabled. It also adds a native policy-fetch/compile pipeline, OpenTelemetry-based auditing, delegation-depth protection, and compensating /runtime/govern calls for disabled centralized guardrails, with comprehensive new test coverage and supporting documentation.
Changes:
- Added an FF-gated runtime wrapper entrypoint (apply_governance_wrapper) and default factory wrapping (UiPathWrappedRuntimeFactory) to apply governance automatically.
- Implemented native governance backend integration (policy fetch + YAML→index compilation), compensation calls, and supporting runtime state (enforcement mode, agent-type selector).
- Added audit sink framework (mandatory traces sink + optional console), improved delegation guard (sync+async), and extensive tests/docs/dependency updates.
Reviewed changes
Copilot reviewed 33 out of 34 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| tests/test_wrapper.py | Tests FF-gated lazy import + fail-open behavior for wrapper application. |
| tests/test_traces_severity.py | Validates OTel span status semantics for hook vs rule spans. |
| tests/test_text_extraction.py | Tests structured payload text extraction for governance scanning. |
| tests/test_registry.py | Updates registry tests for wrapper application and adds wrapper-specific coverage. |
| tests/test_policy_agent_type.py | Tests conversational/autonomous agent-type selector and policy URL param behavior. |
| tests/test_guardrail_compensation.py | Tests compensation payload/headers/URL composition, error swallowing, and evaluator integration. |
| tests/test_evaluator.py | Tests enforcement-mode semantics, audit emission, and sink-failure isolation. |
| tests/test_dispose_isolation.py | Tests step-isolated cleanup semantics in GovernanceRuntime.dispose(). |
| tests/test_delegation_guard.py | Tests async-aware delegation depth guard, idempotency, and leak prevention. |
| tests/test_commitment_concern.py | Tests updated commitment concern detector semantics and regressions. |
| tests/test_audit_register_sink.py | Tests audit sink circuit-breaker counter reset behavior on register/unregister. |
| tests/conftest.py | Adds autouse fixture to reset governance process-level state between tests. |
| src/uipath/runtime/wrapper.py | Adds FF-gated apply_governance_wrapper with lazy import + fail-open behavior. |
| src/uipath/runtime/registry.py | Wraps factories by default to apply runtime wrappers; adds apply_wrappers escape hatch. |
| src/uipath/runtime/governance/wrapper.py | Adds GovernanceRuntime, text extraction, adapter attachment, and runtime-boundary checks. |
| src/uipath/runtime/governance/native/policy_api_client.py | Implements policy URL building and single-shot policy fetch + parsing. |
| src/uipath/runtime/governance/native/models.py | Adds native policy model types (Rule, Check, Condition, PolicyIndex, etc.). |
| src/uipath/runtime/governance/native/loader.py | Adds cached policy loading with background prefetch and fail-open behavior. |
| src/uipath/runtime/governance/native/guardrail_compensation.py | Adds bounded background compensation pool and /runtime/govern POST logic. |
| src/uipath/runtime/governance/native/backend_client.py | Centralizes URL composition, headers, org/tenant/job context resolution, and tunables. |
| src/uipath/runtime/governance/native/_yaml_to_index.py | Parses backend YAML into native PolicyIndex (skip-malformed, partial-pack tolerant). |
| src/uipath/runtime/governance/native/__init__.py | Exposes native governance evaluator/loader/model APIs. |
| src/uipath/runtime/governance/delegation_guard.py | Implements shared-ContextVar delegation depth guard for sync+async entrypoints. |
| src/uipath/runtime/governance/config.py | Adds runtime-level cached enforcement mode state/config. |
| src/uipath/runtime/governance/audit/traces.py | Adds OTel traces sink and span attribute/status semantics. |
| src/uipath/runtime/governance/audit/factory.py | Adds sink factory (traces/console). |
| src/uipath/runtime/governance/audit/console.py | Adds optional console sink output formatting/filtering. |
| src/uipath/runtime/governance/audit/base.py | Adds audit event model + async audit manager with bounded queue and circuit-breaker. |
| src/uipath/runtime/governance/audit/__init__.py | Exposes audit framework public API. |
| src/uipath/runtime/__init__.py | Re-exports governance integration entrypoints from runtime package. |
| pyproject.toml | Bumps uipath-core floor and adds native governance deps + typing overrides. |
| docs/runtime-wrapper-extension.md | Documents the governance integration point, FF gating, and testing approach. |
Failure mode is fail-open: when the organization id is unknown, the
access token is missing, the backend errors (one retry on transient
failures), or the body can't be parsed, the caller falls back to an
empty PolicyIndex. Nothing in this module ever raises to the caller.

headers = governance_request_headers(json_body=True)
headers[TENANT_HEADER] = tenant_id
logger.info("Policy fetch starting (org=%s, tenant=%s)", org_id, tenant_id)

for dumper in ("model_dump", "dict"):
    fn = getattr(value, dumper, None)
    if callable(fn):
        try:
            return _extract_governable_text(
                fn(), budget=budget, seen=seen, depth=depth + 1,
            )
        except Exception:  # noqa: BLE001 - fall through to other extractors
            break

event = _prefetch_event
if event is not None:
    completed = event.wait(timeout=_PREFETCH_WAIT_SECONDS)
    if completed and _policy_index is not None:
        return _policy_index
    logger.warning(
        "Policy prefetch did not complete in %.1fs; "
        "agent will run without any policies",
        _PREFETCH_WAIT_SECONDS,
    )
_policy_index = PolicyIndex()
return _policy_index

if cond.operator == "guardrail_fallback" and isinstance(
    cond.value, dict
):
    validator = str(cond.value.get("validator", ""))
    if validator:

url = build_governance_url(org_id, GOVERN_API_PATH)
headers = governance_request_headers(json_body=True)
headers[TENANT_HEADER] = tenant_id

if self._async_mode:
    # Wait for queue to drain
    try:
        self._queue.join()
    except Exception:
        pass
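The ``self._queue.join()`` call in the excerpt above has no way to time out, because ``queue.Queue.join()`` takes no timeout argument. One way to bound the wait is a deadline-based poll loop, sketched here under assumptions: ``flush_with_timeout`` and the poll interval are hypothetical, and the sketch reads the queue's ``unfinished_tasks`` counter, which is the value ``join()`` itself waits on in CPython.

```python
import queue
import time


def flush_with_timeout(q: queue.Queue, timeout: float, poll: float = 0.01) -> bool:
    """Wait until every queued item has been marked done, or give up.

    Returns True when the queue drained before the deadline, False otherwise.
    """
    deadline = time.monotonic() + timeout  # monotonic: immune to clock changes
    while q.unfinished_tasks:  # items put() but not yet task_done()
        if time.monotonic() >= deadline:
            return False  # caller can log a warning and continue shutting down
        time.sleep(poll)
    return True
```

Returning a boolean rather than raising lets a process-exit hook warn about undrained events without blocking shutdown.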
…tiate audit vs enforce severity in traces
Text extraction (wrapper.py):
- Add "messages" to priority content keys for LangGraph-style state
({"messages": [...]}) so chat history leads the extracted blob.
- Walk lists newest-first so the latest message wins the budget when
the conversation grows.
- New latest_only flag (passed by BEFORE_AGENT) reduces the chat
history to the most recent message; flag resets on recursion so
multi-block content within that message is still walked fully.
- Raise text cap 8K -> 64K to fit multi-turn chat.
Trace severity (audit/traces.py):
- Differentiate "actually blocked" from "advisory" violations:
enforce-mode deny/escalate -> severity=ERROR + StatusCode.ERROR;
audit-mode (any action) or enforce-mode audit-action -> severity=
WARNING, Status left UNSET so the agent span isn't falsely marked
failed.
Tests cover reverse list walk, latest_only semantics, the 64K cap,
and the audit/enforce severity matrix.
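The severity matrix described in this commit can be written as a small pure function. This is a sketch of the stated rules only: the function name and the string values are illustrative, and it deliberately avoids importing OpenTelemetry, so ``"ERROR"`` and ``"UNSET"`` stand in for the corresponding span status codes.

```python
from typing import Tuple

BLOCKING_ACTIONS = {"deny", "escalate"}


def rule_severity(mode: str, action: str) -> Tuple[str, str]:
    """Map (enforcement mode, matched non-allow action) to
    (severity attribute, span status) for a rule-level span."""
    if mode == "enforce" and action in BLOCKING_ACTIONS:
        # The runtime actually blocked: mark the rule span as failed.
        return ("ERROR", "ERROR")
    # Audit mode (any action) or an enforce-mode audit action is advisory,
    # so the status stays UNSET and the agent span is not marked failed.
    return ("WARNING", "UNSET")
```

For example, ``rule_severity("audit", "deny")`` yields ``("WARNING", "UNSET")`` because nothing was blocked, while the same action in enforce mode yields ``("ERROR", "ERROR")``.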
Addresses all 7 Copilot review comments on PR #118 and switches the
default enforcement mode so empty-policy tenants pay zero per-call
audit overhead.
PR-118 review comments:
- policy_api_client docstring no longer claims "one retry on transient
failures"; _get_once is and remains single-shot by design.
- Policy fetch GET drops Content-Type: application/json (was sent via
json_body=True). Strict origin servers can 415 on unexpected
Content-Type for GETs; the helper's own docstring recommends omitting
it on reads.
- _extract_governable_text dumper loop now CONTINUES instead of BREAKS
when model_dump() raises, so dict() is tried as documented ("fall
through to other extractors").
- loader.get_policy_index distinguishes "prefetch did not complete in
Xs" from "prefetch completed but produced no PolicyIndex", so prod
triage can now tell a hung fetch from an auth / parse failure.
- disabled_guardrails defensively re-checks mapped_to_uipath=True AND
policy_enabled=False on every guardrail_fallback condition. Matches
the function's docstring and protects against multi-condition rules or
any future code path that bypasses the evaluator gate.
- request_governance pre-checks UIPATH_ACCESS_TOKEN and skips when
missing. Sending without a bearer guarantees a 401 per compensation
call and pollutes logs; mirrors the org-id / tenant-id skip pattern
already in place.
- AuditManager.flush(timeout=...) now honors its timeout via a
time.monotonic() poll loop and warns if drain doesn't complete.
Previously called queue.Queue.join() with no timeout argument,
allowing indefinite block, which is risky at process exit where
_cleanup_audit_manager supplies a 2-second timeout that was being
silently ignored.
Default enforcement mode:
- get_enforcement_mode default fallback flipped from AUDIT to DISABLED.
The server-supplied mode (applied by the policy loader on every
successful fetch) still wins; the env-var override still works.
- Empty-policy / failed-fetch / pre-fetch tenants now short-circuit at
evaluator.py:332 with no _emit_audit call, no OTel spans, no
AuditManager queue traffic. Previously these scenarios silently fell
through to AUDIT and produced ~40 empty governance spans per turn for
an N=10 LLM-call agent.
Tests (245 passing, +7 new):
- test_enforcement_mode_default.py pins the resolution order
(programmatic > env > DISABLED default) and the
invalid-env-falls-back-to-DISABLED behavior.
- test_request_governance_skipped_when_token_missing pins the new
bearer-token skip path.
- _govern_env fixture now sets UIPATH_ACCESS_TOKEN; the headers test
asserts the Authorization header is present (was a side-effect of the
no-token test, which is now moved out).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
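The resolution order pinned by the new tests (programmatic > env > DISABLED default, with an invalid env value falling back to DISABLED) can be sketched as below. The environment variable name, the setter, and the enum values are hypothetical; only the ordering and the fallback come from the commit message.

```python
import os
from enum import Enum
from typing import Optional


class EnforcementMode(str, Enum):
    DISABLED = "disabled"
    AUDIT = "audit"
    ENFORCE = "enforce"


ENV_VAR = "GOVERNANCE_MODE_SKETCH"  # hypothetical name, not the real variable
_programmatic: Optional[EnforcementMode] = None


def set_mode(mode: Optional[EnforcementMode]) -> None:
    """Programmatic override, e.g. applied after a successful policy fetch."""
    global _programmatic
    _programmatic = mode


def get_mode() -> EnforcementMode:
    """Resolve the mode: programmatic value, then env var, then DISABLED."""
    if _programmatic is not None:
        return _programmatic
    raw = os.environ.get(ENV_VAR)
    if raw:
        try:
            return EnforcementMode(raw.strip().lower())
        except ValueError:
            pass  # invalid env value falls through to the default
    return EnforcementMode.DISABLED
```

With DISABLED as the final fallback, a tenant with no fetched policy and no override resolves to a mode the evaluator can skip without emitting audit events.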
TEST IN PROGRESS: PR that includes runtime changes for governance
Development Package